-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add audbackend.backend.Minio #231
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files
|
|
||
""" | ||
path = path.replace(self.sep, "/") | ||
if path.startswith("/"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python versions greater than 3.8 it would be safe to remove the conditional and replace it with. path = path.removeprefix("/")
. I believe that it is getting close to deprecating 3.8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When removing Python 3.8, there are other things we should update, e.g. if I remember correctly the typing
module might no longer be needed. I would prefer to first create a general issue to gather all the things we would like to update when removing Python 3.8, and then do this in dedicated pull requests.
Which means for now in this pull request we need to still support Python 3.8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now tracked in [tools/meta#12] internally.
audbackend/core/backend/minio.py
Outdated
cannot be closed. | ||
|
||
""" | ||
# At the moment, this is automatically handled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Automatically handled means by the minio client lib. Would it make sense to make this explicit?
The docstring is identical to the one in the base class Base
. Possibly the information that this is automatically handled should go into the subclass docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, we can also take care about opening and closing by providing an http_client
object to MinIO, but as it will be very complicated to find out the best settings in that case, I would stay for now with the default and just state that closing is automatically handled by MinIO.
r"""Copy file on backend.""" | ||
src_path = self.path(src_path) | ||
dst_path = self.path(dst_path) | ||
# `copy_object()` has a maximum size limit of 5GB. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 5GB limit is a limit of S3 native? This post suggests this at least. So the minio client uses s3n?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know, I just found the information in the MinIO documentation. As we don't have a test for this, maybe we should also lower the value slightly to be on the safe side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably lowering would be ok to get it working right now.
The background is a different one: if MinIO has the limit of 5GB this suggests that the minio package is using the old s3n interface. Afaiu s3a succeeds s3n. The difference between is that s3n supports objects up to 5GB, but s3a should support 5TB and has improved performance. I cannof find the link right now, but I think I had seen that fsspec
has an implementation of s3a.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the size limit to 4.9 GB to be on the safe side.
* Add support for owner() * Be more conservative regarding owner
audbackend/core/backend/minio.py
Outdated
def _size( | ||
self, | ||
path: str, | ||
) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used by _copy_file
where it is used to determine whether file size exceeds 5GB. However I can see no cast in _copy_file
. Should the return type hint be integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it returns indeed an integer. I corrected the typing information.
backend.open() | ||
|
||
|
||
def test_get_config(tmpdir, hosts, hide_credentials): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tests parsing of the configuration, but does not have a one-line docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a docstring.
) | ||
def test_maven_file_structure( | ||
tmpdir, interface, file, version, extensions, regex, expected | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the test name is already telling, a single line docstring would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a docstring.
tests/test_interface_unversioned.py
Outdated
@@ -91,6 +91,7 @@ def tree(tmpdir, request): | |||
[ | |||
(audbackend.backend.Artifactory, audbackend.interface.Unversioned), | |||
(audbackend.backend.FileSystem, audbackend.interface.Unversioned), | |||
(audbackend.backend.Minio, audbackend.interface.Unversioned), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface parametrization seems to vary little between tests.
Would it make sense to recast this into a fixture? Possibly like this:
@pytest.fixture(params=[
(audbackend.backend.Artifactory, audbackend.interface.Unversioned),
(audbackend.backend.FileSystem, audbackend.interface.Unversioned),
(audbackend.backend.Minio, audbackend.interface.Unversioned),
(SingleFolder, audbackend.interface.Unversioned),
])
def interface(request):
return request.param
Probably local to test_interface_unversioned.py
would be sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interface
was already a fixture, but I could solve the problem by a variable:
# Backend-interface combinations to use in all tests
backend_interface_combinations = [
(audbackend.backend.Artifactory, audbackend.interface.Unversioned),
(audbackend.backend.FileSystem, audbackend.interface.Unversioned),
(audbackend.backend.Minio, audbackend.interface.Unversioned),
(SingleFolder, audbackend.interface.Unversioned),
]
# ...
@pytest.mark.parametrize(
"interface",
backend_interface_combinations,
indirect=True,
)
def test_archive(tmpdir, tree, archive, files, tmp_root, interface, expected):
Reviewer's Guide by SourceryThis pull request implements a new backend for MinIO in the audbackend library. It adds support for MinIO storage, which is compatible with S3, and includes necessary changes to the existing codebase to integrate the new backend. The implementation follows a similar pattern to the existing Artifactory backend, with additional configuration options specific to MinIO. Class diagram for the new Minio backendclassDiagram
class Minio {
- MinioClient _client
+ Minio(host: str, repository: str, authentication: Tuple[str, str] = None, secure: bool = None, **kwargs)
+ get_authentication(host: str) Tuple[str, str]
+ get_config(host: str) Dict
+ close()
+ _checksum(path: str) str
+ _collapse(path)
+ _copy_file(src_path: str, dst_path: str, verbose: bool)
+ _create()
+ _date(path: str) str
+ _delete()
+ _exists(path: str) bool
+ _get_file(src_path: str, dst_path: str, verbose: bool)
+ _ls(path: str) List[str]
+ _move_file(src_path: str, dst_path: str, verbose: bool)
+ _open()
+ _owner(path: str) str
+ path(path: str) str
+ _put_file(src_path: str, dst_path: str, checksum: str, verbose: bool)
+ _remove_file(path: str)
+ _size(path: str) int
}
class Base {
<<abstract>>
}
Minio --|> Base
note for Minio "This class implements a backend for MinIO storage, compatible with S3."
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @hagenw - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🔴 Security: 2 blocking issues
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
I addressed now all comments and updated the description by mentioning our Hetzner S3 test. |
Implements a backend for MinIO.
As MinIO is compatible with S3 storage, this might already work with other S3 storage as well. @ChristianGeng has tested this partly for a S3 instance at Hetzner, which worked with the exception that creation and deletion of buckets and empty files were not supported.
I decided to handle the authentication the same way as we do for
audbackend.backend.Artifactory
, by allowing to specify the username / password inside a config file or environment variable. In addition, we have the class methodaudbackend.backend.Minio.get_authentication()
to retrieve them, in order to check their values.For local MinIO servers we also need to set the
secure
argument toFalse
. I added this to the config file as well, in oder to make it part of the backend configuration, and added the class methodaudbackend.backend.Minio.get_config()
, which returns all entries as dictionary.In certain edge cases you might want to add other arguments to the underlying
minio.Minio
class. For that reason I decided to add support for**kwargs
, but do no checks if there is any conflict with authentication as I think this is only for power users anyway.Summary by Sourcery
Add support for MinIO as a new backend in the audbackend library, enabling storage operations with MinIO. Refactor tests to accommodate the new backend and ensure robust testing of its functionality.
New Features:
Enhancements:
Tests: